METHOD AND SYSTEM OF MARKING A TEXT DOCUMENT
WITH A PATTERN OF EXTRA BLANKS
FOR AUTHENTICATION



Field of the Invention

The present invention relates to the field of document
authentication. It is more specifically concerned with the
authentication of soft or hard copies of plain text documents.

Background of the Invention 

In the current environment of computer networks, characterized
by an exponential growth in the circulation of soft-copy or
electronic text documents such as e-mail over unsecured media,
e.g., the Internet, combined with the possibility for anyone to
easily print and photocopy a hard copy of the same text
documents, a key issue is authentication. It should be possible
for the recipient of a text document, be it an electronic
message or a hard copy of it, to make sure of its origin so
that no one should be able to masquerade as someone else. Also,
it should be possible to verify it has not been modified,
accidentally or maliciously, en route. To this end, methods
have been devised to perform authentication.

The standard solution, which fits well with electronic text
documents, consists in adding a MAC, or Message Authentication
Code, to soft-copy text documents. A MAC is a digest computed
with a one-way hash function over the text and which is also
made dependent on a key, e.g., a secret-key known only to the
sender and the receiver, so that the latter can check first,
that what it received was indeed originated by whoever shares
the secret-key with it and second, that the document has not
been altered. For example, the Secure Hash Algorithm or SHA
specified by the National Institute of Standards and
Technology, NIST, FIPS PUB 180-1, "Secure Hash Standard", US
Dpt of Commerce, May 93, produces a 160-bit hash. It may be
combined with a key, e.g., through the use of a mechanism
referred to as HMAC or Keyed-Hashing for Message
Authentication, subject of the RFC (Request For Comment) of the
IETF (Internet Engineering Task Force) under the number 2104.
HMAC is devised so that it can be used with any iterative
cryptographic hash function, thus including SHA. Therefore, a
MAC can be appended to the soft-copy of a text document so that
the whole can be checked by the recipient. Obviously, this
method does not work on hard-copy text documents since it
assumes the addition of checking information to a file.
Moreover, this scheme has the inconvenience of separating text
and checking information. Thus, the latter can easily be
isolated and removed intentionally, in an attempt to cheat, or
accidentally, just because intermediate pieces of equipment in
charge of forwarding the electronic documents are not devised
to manipulate this extra piece of information. Hence, the
checking information should rather be encoded transparently
into the body of the text document itself, i.e., in a manner
that does not affect text readability whatsoever, so that it
remains intact across the various manipulations it is exposed
to on its way to destination, still enabling the end-recipient
to authenticate the document.
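
For comparison, the conventional appended-MAC approach described
above may be sketched as follows. This is only an illustration,
assuming HMAC with SHA-1 in the manner of RFC 2104; the helper
names append_mac and check_mac, the "MAC: " line prefix and the
sample key are hypothetical and do not come from the cited
standards. The tag travels as a separate trailing line, which is
precisely the separation of text and checking information that
is criticized above.

```python
import hashlib
import hmac

TAG_PREFIX = "MAC: "  # hypothetical marker for the appended line


def append_mac(text: str, key: bytes) -> str:
    """Return the text followed by a line carrying its HMAC-SHA1 tag."""
    tag = hmac.new(key, text.encode("utf-8"), hashlib.sha1).hexdigest()
    return text + "\n" + TAG_PREFIX + tag


def check_mac(message: str, key: bytes) -> bool:
    """Recompute the tag over the body and compare it to the appended one."""
    body, sep, last_line = message.rpartition("\n")
    if not sep or not last_line.startswith(TAG_PREFIX):
        return False  # the checking information was stripped en route
    expected = hmac.new(key, body.encode("utf-8"), hashlib.sha1).hexdigest()
    received = last_line[len(TAG_PREFIX):]
    return hmac.compare_digest(expected.encode("utf-8"),
                               received.encode("utf-8"))
```

A message whose last line has been dropped by an intermediate
system simply fails the check, even though its text is intact.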

Another type of approach to authentication, which applies
mainly to soft-copy images (and which thus may also be used on
the image of a hard-copy text document, while still failing to
work directly from hard copy), consists in hiding data into
their digital representation, therefore meeting the above
requirement that checking information should preferably be
merged into the document itself. Data hiding has received
considerable attention, mainly because of the copyrights
attached to digital multimedia materials which can easily be
copied and distributed everywhere through the Internet and
networks in general. A good review of data hiding techniques is
in 'Techniques for data hiding' by W. Bender et al., published
in the IBM Systems Journal, Vol. 35, Nos 3&4, 1996. As an
illustration of the way data hiding may be carried out, the
most common form of high bit-rate encoding reported in the
above paper is the replacement of the least significant
luminance bit of image data with the embedded data. This
technique, which indeed meets the requirement of being
imperceptible (the restored image is far from being altered to
a point where this would become noticeable), may serve various
purposes similar to authentication, including watermarking,
aimed at placing an indelible mark on an image, or
tamper-proofing, to detect image alterations, especially
through the embedding of a MAC into the soft-copy image.
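
The least-significant-bit replacement mentioned above may be
illustrated with a few lines of Python. This is a minimal
sketch, assuming the image is given as a flat list of 8-bit
luminance values; the function names embed_bits and extract_bits
are hypothetical and are not taken from the cited paper.

```python
def embed_bits(pixels, bits):
    """Overwrite the least significant bit of each value with a payload bit."""
    if len(bits) > len(pixels):
        raise ValueError("payload is longer than the image")
    marked = list(pixels)
    for i, bit in enumerate(bits):
        marked[i] = (marked[i] & ~1) | bit  # clear the LSB, then set it
    return marked


def extract_bits(pixels, count):
    """Read back the first `count` embedded bits."""
    return [value & 1 for value in pixels[:count]]
```

Each luminance value changes by at most one unit, which is why
the alteration is not noticeable to a viewer.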

However, having to consider a text as an image would be a very
costly and inadequate solution in terms of the storage and
bandwidth necessary to transmit it. Although, as stated in the
above paper, soft-copy text is in many ways the most difficult
place to hide data, due to the lack of redundant information in
a text file as compared to a picture, the manipulation of white
spaces, i.e., blank characters and more specifically inter-word
blank characters purposely inserted by the originator of a text
document in excess of what is necessary to make a text
readable, is the simplest way of marking a text so that it can
be authenticated without the addition of a separate MAC, since
the information necessary for the checking is then embedded,
somehow hidden, into the text itself in the form of blanks that
the casual reader is unlikely to take notice of.

Therefore it is an object of the invention to provide a
method to merge the information necessary to authenticate a
text document into the body of the document itself.

It is another object of the invention to have this method
applicable to both soft-copy and hard-copy text documents.

Further objects, features and advantages of the present
invention will become apparent to those skilled in the art
upon examination of the following description in reference to
the accompanying drawings. It is intended that any additional
advantages be incorporated herein.
Summary of the Invention



A method and a system of marking a text document through
the insertion of inter-word blank characters are disclosed.
The method first consists in editing the number of inter-word
blank characters of the text document in order to conform to a
model so as to obtain a canonical text document. Then, from
the canonical text document, to further conform to the model,
a subset of positions of the inter-word blank characters is
retained in which insertion of blank characters is permitted.
After which, using the canonical text document and a secret-key
as inputs, a unique combination of positions, among the above
subset of positions, is computed. Into each position of the
unique combination of positions just computed at least one
extra blank character is inserted, thus obtaining a marked
text document. The same method also applies to a received
marked text document to be authenticated by a recipient
sharing the secret-key, further including, however, a
comparison of the received text document to the marked text
document so that if they match exactly the received text
document is accepted as authentic. If not, it is rejected as
fake.

Therefore the invention provides a method and a system to
merge the information necessary to authenticate a text
document into the body of the document itself, through the
insertion of extra blanks that the casual reader is unlikely
to take notice of, and which works equally well on soft-copy
and hard-copy text documents.
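
The marking and checking steps summarized above may be sketched
end to end as follows. This is a simplified illustration, not
the preferred embodiment: it assumes the simplest canonical
model (one blank between words, every position eligible) and,
for brevity, derives the positions with HMAC-SHA256 from the
Python standard library instead of a separate digest and
pseudo-random-number generator. The function names and the
counter-based derivation are hypothetical.

```python
import hashlib
import hmac
import re


def canonical(text: str) -> str:
    """Simplest model: exactly one blank between any two words."""
    return re.sub(r" +", " ", text.strip(" "))


def choose_positions(ctext: str, key: bytes, count: int) -> set:
    """Derive `count` distinct inter-word blank indices from key and text."""
    total = ctext.count(" ")
    if count > total:
        raise ValueError("more extra blanks requested than positions")
    chosen = set()
    counter = 0
    while len(chosen) < count:
        msg = counter.to_bytes(4, "big") + ctext.encode("utf-8")
        digest = hmac.new(key, msg, hashlib.sha256).digest()
        # Illustrative only: the modulo introduces a slight bias.
        chosen.add(int.from_bytes(digest[:4], "big") % total)
        counter += 1
    return chosen


def mark(text: str, key: bytes, count: int) -> str:
    """Canonicalize, then double the blank at each chosen position."""
    ctext = canonical(text)
    chosen = choose_positions(ctext, key, count)
    words = ctext.split(" ")
    pieces = [words[0]]
    for index, word in enumerate(words[1:]):
        pieces.append("  " if index in chosen else " ")
        pieces.append(word)
    return "".join(pieces)


def is_authentic(received: str, key: bytes, count: int) -> bool:
    """The recipient re-marks the received text and compares exactly."""
    return mark(received, key, count) == received
```

The recipient needs nothing beyond the received text, the shared
secret-key and the agreed number of extra blanks.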



Brief Description of the Drawings



Figure 1 is an overall description of the invention, which
allows a text to be authenticated through the insertion
of extra blanks.

Figure 2 discusses what a canonical text may be and
how the invention may also apply to a hard-copy
text.

Figure 3 is an overall description of the main step of the
invention, the output of which is a marked text which
can be authenticated.

Figure 4 is one example of how to generate a unique seed
which is dependent on both the text and a shared
secret-key.

Figure 5 is one example of a pseudo-random-number generator
aimed at producing a stream of numbers to insert
blanks between words of the text.
Detailed Description of the Preferred Embodiment



In the following description of figures, on several
occasions, reference is made to 'Applied Cryptography', a
book authored by Bruce Schneier and published by John Wiley &
Sons, 2nd edition, 1996. References will be abbreviated as
[SCH] and may include a particular chapter, e.g.: [SCH/Ch.18].

Figure 1 describes the overall process per the invention
in order to obtain a marked text for authentication. The
process starts from an original text 'oText' [100] that must
be marked so that the recipient of the message can verify it
has indeed been marked by whoever shares, with the recipient, a
common secret-key [130] while making sure at the same time
that the text has not been altered, accidentally or
maliciously, on its way. Original text [100] generally contains
extra blanks added for typographic reasons. For example, two
blanks often follow a full stop, as in [101]. Then, the first
processing step [110] consists in removing all unnecessary
blanks from the original text, thus leaving only one blank
between any two words. The result of this is the canonical
form of 'oText', i.e., 'cText' [120], in which no two words are
separated by more than one blank. Although very simple, this
way of obtaining a canonical text is not the only one possible.
Figure 2 hereafter further elaborates on what a canonical text
could possibly be. Although essential to the invention, this
first step is considered straightforward to implement. Apart
from figure 2, which considers some possible alternatives for
the canonical text, this will not be further discussed. In
particular, it will be assumed, in the rest of this description
of the invention, that obtaining 'cText' is a trivial operation
for those skilled in the art. The second, far less trivial,
processing step [140] consists in uniquely marking 'cText' by
inserting some extra blanks [151] so as to obtain a marked
text 'mText' [150]. Inputs to this second processing step
[140] are 'cText' [120] on one hand and a
shared secret-key [130] on the other hand. A preferred
embodiment of this second step is described at length in the
following figures. However, the whole intent of it is to
produce an 'mText' which is unique given 'cText' and the
secret-key. In other words, no one not sharing the secret-key
should be able to easily forge, from a fake or altered 'cText',
an 'mText' that is likely to be accepted by the recipient
which, applying the same overall process, will not obtain the
same pattern of extra blanks [151], thus permitting it to
decide that either the text has been altered or the sender is
not who it pretends to be. Obviously, the insertion of extra
blanks at step [140] must be such that it is unfeasible (i.e.,
in practice, computationally very difficult or long with
today's available computing resources) to retrieve the
secret-key from 'mText'.

As briefly mentioned above, sender and receiver apply
basically the same process to mark a text or to check it on
reception. First step [110] and second step [140] are
identical. The only difference is that the sender applies the
first step to the original text 'oText' while the receiver
applies it to the received 'mText'. Both produce the same
'cText' if 'mText' has not been altered. The recipient
authenticates the received 'mText' because it is able to
reconstruct the same 'mText' from 'cText', i.e., with the same
number of extra blanks inserted in the same positions, so that
when compared [160] both match exactly [161]; otherwise the
comparison fails [162], in which case the received text is
rejected.

Finally, to be of practical value the invention requires
that the probability of randomly producing the same 'mText' be
extremely low (so that no collision is likely to occur that
would open the door to an attack eventually resulting in the
discovery of the secret-key). In the example of this figure
'oText' is, for practical purposes, a rather short text
comprised of 72 words; thus 'cText' has 71 inter-word blanks,
that is, there are 71 opportunities to insert extra
blanks. If, however, one wants to limit the number of inserted
blanks to, e.g., 10% of the total number of required inter-word
blanks so that a casual reader is unlikely to take notice of
them (especially if the font in use is proportional, contrary
to the example in figure 1 which uses a fixed-pitch 'courier'
font in order to illustrate simply the mechanism of the
invention) then no more than 7 extra blanks should be inserted
in the text of this particular example. This corresponds to a
probability lower than 1 over 10^9 and more precisely:
1/(C(71,1) + C(71,2) + ... + C(71,7)), where C(71,k) is the
number of ways of choosing k positions among 71. This might not
be sufficient though to defeat a serious attack, thus
suggesting that the invention is better fitted for longer texts
with more opportunities to insert blanks, unless one accepts
inserting more than 10% of blanks, especially in the case of
rather short texts like the one of figure 1, in which the
probability could be lowered, e.g., roughly by another order of
magnitude just by permitting one more extra blank (8 instead of
7) to be inserted. Alternatively, a solution for a short text
is to pad it with, e.g., a banner, some mail closing
information, a disclaimer or a warning text. These are
necessary practical tradeoffs which must be considered in the
various applications to which the invention applies.
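
The orders of magnitude quoted for the 71-position example can
be checked with a short computation. The sketch below assumes
that a pattern consists of between one and a given maximum
number of extra blanks, each placed in a distinct position; the
function name pattern_count is hypothetical.

```python
from math import comb


def pattern_count(positions: int, max_extra: int) -> int:
    """Number of ways to place between 1 and max_extra extra blanks."""
    return sum(comb(positions, k) for k in range(1, max_extra + 1))


if __name__ == "__main__":
    for limit in (7, 8):
        count = pattern_count(71, limit)
        print(f"up to {limit} blanks: {count} patterns, "
              f"chance of a random match about {1 / count:.2e}")
```

With 71 positions, allowing up to 7 blanks gives somewhat more
than 10^9 patterns, and allowing an 8th raises the count above
10^10, in line with the figures given above.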

Figure 2 discusses what a canonical text may be, since there
is no such thing as a unique definition of what a canonical
text should ideally be. While the example of figure 1 assumes
that only a single blank is left between any two words (a broad
definition of a word here is any number of non-blank characters
in between two blank characters), this simple approach has
shortcomings. The recipient of such a soft-copy text has no
certain way to reformat the original text since it does not
know where lines break. And, if this is a hard-copy text, it
cannot decide how many extra blanks would be present at the end
of each line.
As far as hard-copy texts are concerned, it must be mentioned
that the invention assumes the use of a convenient optical
device able to discriminate the number of blanks actually
inserted into the text. While counting inserted blanks through
a simple visual inspection of a hard-copy text is certainly
feasible with fixed-pitch fonts, this may become very
difficult, if not impossible, when the font used to print the
text is proportional. In both cases this would anyway be a very
cumbersome, error-prone job. Thus, the application of the
invention to hard-copy texts such as [200] requires, in the
general case, the use of an appropriate tool set comprised of
an optical device and associated computing resources so that a
soft copy of a text can be recovered automatically with the
right number of blanks inserted between words. Apparatus and
software to achieve that are available nowadays. Software
programs broadly referred to as OCR (Optical Character
Recognition), running, e.g., on a PC (Personal Computer) [215]
controlling an Optical Scanner [210], are commercially
available and widely used.

Therefore, an example of another definition of a canonical
text [220], a little more sophisticated than the one of
figure 1, assumes there is still only one blank between words,
e.g., [230], while three blanks end a line, e.g., [240]. Then,
marking a text would be done, as in figure 1, by inserting one
extra blank in selected positions, always excluding, however,
the ends of lines (three blanks), so that this form of
canonical text becomes compatible with both soft-copy and
hard-copy text while permitting the lines of the original text
to be retrieved from the soft-copy marked text. A minor
drawback is that there are fewer opportunities left to insert
extra blanks since end-of-line [240] inter-word positions are
excluded (so that hard copy can be handled).
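
This second canonical form may be sketched as follows. The
sketch assumes that the canonical text is kept as a single
stream in which each line break is replaced by a run of three
blanks, and that empty lines are dropped; both choices, and the
function names, are illustrative assumptions rather than rules
stated above.

```python
import re

EOL = "   "  # three blanks stand for an end of line in this variant


def canonical_with_line_ends(text: str) -> str:
    """One blank between words, three blanks in place of each line break."""
    lines = [" ".join(line.split()) for line in text.splitlines()]
    return EOL.join(line for line in lines if line)


def eligible_positions(ctext: str) -> list:
    """Character offsets of single inter-word blanks (line ends excluded)."""
    return [m.start() for m in re.finditer(r"(?<! ) (?! )", ctext)]


def original_lines(ctext: str) -> list:
    """Recover the lines of the text from the three-blank separators."""
    return ctext.split(EOL)
```

Because at most one extra blank is inserted per eligible
position, a marked position holds two blanks and can never be
confused with a three-blank end of line.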

Clearly, in the light of this second example, there are
many possibilities to define a set of rules to obtain a
canonical text, which must be agreed on by all those involved
and must be such that creating 'cText' is always unambiguous.
That is, given an original text 'oText' or a received 'mText',
there must always exist one possibility, and only one, for the
corresponding 'cText'. Depending on the complexity of these
rules, it may not always be possible to preserve the
compatibility between hard and soft copies of a text though.
In particular, text formatted by a word processor that
automatically inserts extra spaces between words so as to
justify the text (i.e., formatting a text so that it is left
and right justified [250]) no longer permits, from the
hard-copy text, discriminating the number of 'real' blanks,
e.g., [255], inserted in the text by its originator from the
extra spaces which may have been inserted by the word processor
itself, e.g., [260].



In this latter example only a soft copy of a text, encoded
with extra blanks, can possibly be handled by the invention,
with the important advantage, however, that when displayed or
printed the text hides even better the blanks that have been
purposely added by the originator of the text, which become
harder to distinguish from spaces added by the text formatter.
In this example, extracted from a paper by Daniel X. Le, dated
November 18, 1997 and entitled 'Document Imaging Software
Toolkits, Computer-Assisted Zoning Software, the OCR Voting
Machine, and OCR Verification Software', available on the LD
Technologies Inc. WEB site at http://www.ldtechnologies.com/,
the text has been formatted using Word97™, the well-known text
processor trade mark of Microsoft Corporation, One Microsoft
Way, Redmond, WA 98052, USA. This example clearly shows
inserted blanks [255] displayed (along with other non-printable
characters like [265]) in the form of small dots (not printing
and not normally displayed) plus the added spaces such as [260]
used to obtain a left and right justified text.
Figure 3 focuses more on the second processing step of
the invention, i.e., step [140] of figure 1. Although there are
many equivalent alternatives to carry it out, figure 3 depicts
an overall preferred embodiment [340] whose details of
implementation are further described through examples in the
following figures. This step mostly calls for the use of
standard techniques known from the art of cryptography (the
limitations of which have been thoroughly studied, with results
available in the abundant literature on the subject) so as to
produce an 'mText' according to what was described in figure 1.
Thus, the second processing step may start [300] when a
secret-key [330] is available and a canonical text 'cText'
[320] is made ready. Then, the first sub-step [342] is
performed. Its output is a fixed-length keyed message digest
representative, on one hand, of 'cText', irrespective of its
actual length, and of the secret-key on the other hand. That
is, sub-step [342] is aimed at supplying a seed [344] to the
next sub-step, i.e., the PRN (pseudo-random-number) generator
[346]. Seed [344], the result of sub-step [342], should ideally
be such that, for a given secret-key, no two 'cText' ever
output the same value, so that not only are two completely
different texts expected to always produce different values but
the faintest alteration to a certain 'cText' should also return
a different value. In practice this means that the function
used in sub-step [342] must not be statistically biased and
that the computed digest, i.e., the seed [344], must be made
wide enough so that the probability of finding two texts
returning the same value is extremely low.

Moreover, the seed is made dependent on the secret-key
[330] so that it is also different if produced from the same
'cText' but from a different secret-key. Those skilled in the
art will recognize that obtaining such a seed is closely
related to obtaining a MAC (Message Authentication Code), which
generally assumes the use of a standard technique in
cryptography, that is, hashing and, more specifically with MAC,
one-way hashing. Indeed, available one-way hash functions are
such that the probability of having two messages hashing to the



same value is very low. Also, if one bit changes in the
input text then, on average, half of the bits of the hash
result flip, and it is practically unfeasible to find a text
that hashes to a given value. This latter feature of one-way
hash functions is however further discussed in the following
since, while it does no harm, it may not be essential to carry
out the invention, thus allowing this step to be somewhat
simplified.

Many hashing algorithms are available. A good overview of
them can be found in [SCH/Ch.18]. After (or while) a digest of
'cText' has been (is) obtained it must be combined, in one way
or another, with the shared secret-key [330], thus obtaining a
MAC so that only those sharing the secret-key are able to
verify the hash, thus providing authenticity without secrecy.
The standard practice for authentication of a soft-copy text is
therefore to append a MAC to the corresponding text file so
that the recipient can check it, while the invention assumes it
is used instead as a seed to a PRN generator [346] from which
numbers are generated and utilized at sub-step [348] to decide
where blanks must be inserted into 'cText' in order to obtain
a marked text 'mText' that can be authenticated as explained in
figure 1. How many random numbers are to be generated is
directly dependent on the acceptable probability of obtaining
the same 'mText' purely by chance. Depending on the particular
application of the invention this will be an input parameter.

Pseudo-random-number generators are widely used in
cryptography and have received considerable attention. Again,
a good review can be found in [SCH/Ch.16&17] while a much more
theoretical approach to random numbers is, e.g., in 'The Art of
Computer Programming' by Donald E. KNUTH, Chapter 3, Volume 2,
Addison-Wesley. Thus, a convenient pseudo-random generator can
be chosen for a particular application of the invention. An
example is given in figure 5 hereafter. Whichever PRN generator
is selected, it should preferably be such that it is
computationally very hard, if not unfeasible, starting from
the pattern of extra blanks, to backtrack to the seed [344].
Provided this is true, fewer constraints need be put on step
[342]. As previously suggested, one-wayness of this step
becomes far less critical. Its main objective, which still
holds, is that no two 'cText' [320] should ideally return the
same seed with an identical secret-key. With this approach,
seed [344] needs only to be a unique representation of 'cText'
somehow combined with the secret-key: since it is assumed to be
impossible or very hard to backtrack from the pattern of extra
blanks to the seed, it becomes unimportant whether the
secret-key can easily be retrieved from the seed.

Although the following figures are more particularly
aimed at illustrating this preferred embodiment of the
invention, it is equally possible to make other choices better
suited to a particular application of the invention. Those
skilled in the art will recognize that it is also possible,
e.g., to choose a weaker PRN generator while putting most of
the difficulty of retrieving the secret-key, and of ensuring
one-wayness, into step [342] instead, as the standard
computation of MACs assumes.

Figure 4 gives one example of how a digest can be derived
from a 'cText' and a secret-key and used as a seed to a PRN
generator. Again, this example assumes that being able to
backtrack from seed to secret-key at this step is unimportant
since one-wayness is rather ensured by the PRN generator, an
example of which is given hereafter in figure 5.

Therefore, a computationally simple method for generating
a seed that can be considered is similar to the way a popular
data compression program known under the name of PKZIP™ (a
product of PKWARE, Inc., 9025 Deerwood Drive, Brown Deer,
WI 53223, USA, having a WEB site at http://www.pkware.com/)
performs encryption. This is built around a degree-32 CRC
(Cyclic Redundancy Checking) irreducible polynomial referred
to as CRC-32 [420] in the following. More on this can be found
in [SCH/Ch.16.12]. Then, basically, as with CRC, the method
consists in dividing the stream of bits resulting from the
conversion of the text characters [400] into their 8-bit
binary equivalent by the above CRC-32 polynomial. That is,
each text character is assumed to be, e.g., coded in ASCII
[410]. The remainder of such a division, which is performed
modulo 2 at bit level (with simple XORs) and modulo the CRC-32
polynomial (a 33-bit vector), is at most a 32-bit wide vector
[430]. To combine the secret-key, so as to obtain a seed which
is a function of both text and key, the simplest way is to
first concatenate the secret-key, e.g., a 16-character (128
bits) string [405] or [415] in ASCII, with the text so that
the remainder of the division is indeed a combination of the
text and secret-key. Although the PKZIP cipher is known to be
weak (as reported by E. Biham and P.C. Kocher in 'A Known
Plaintext Attack on the PKZIP Encryption', K.U. Leuven Workshop
on Cryptographic Algorithms, Springer-Verlag, 1995) the method
is nevertheless convenient to generate a seed in the preferred
embodiment of the invention since it is up to the PRN
generator, in this approach of the invention, to take care of
one-wayness. Statistically, the probability of getting the same
seed from two different texts is only 1 over 2^32 or
4,294,967,296. With this particular text and key, digest [430]
obtained with the CRC-32 function and used as a seed for the
PRN generator described in figure 5 is, in binary:

'11010011001110010111100110101100'
or in decimal: 3,543,759,276.
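
A keyed 32-bit seed of this kind may be sketched with the CRC-32
routine of the Python standard library. This is an illustration
under stated assumptions: zlib.crc32 applies the usual CRC-32
conventions (initial value, bit ordering and final inversion),
which may differ from the plain polynomial division described
above, so the sketch is not expected to reproduce the value
3,543,759,276; placing the key before the text, the function
name crc32_seed and the sample inputs are likewise hypothetical.

```python
import zlib


def crc32_seed(key: bytes, ctext: str) -> int:
    """32-bit digest of the secret-key concatenated with the canonical text."""
    data = key + ctext.encode("ascii")  # characters assumed coded in ASCII
    return zlib.crc32(data) & 0xFFFFFFFF


if __name__ == "__main__":
    key = b"0123456789abcdef"  # a 16-character (128-bit) sample key
    seed = crc32_seed(key, "Meet me at noon.")
    print(format(seed, "032b"), seed)
```

Changing a single character of either the text or the key
yields a different seed, since a CRC detects any error confined
to a short burst of bits.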



Figure 5 is an example of a pseudo-random-number (PRN)
generator [500] suitable for the preferred embodiment of the
invention. The BBS (Blum, Blum and Shub) generator described in
[SCH/Ch.17.9] is one of the simplest and most efficient
generators of its kind. It is said to be unpredictable to the
left and unpredictable to the right, that is, a cryptanalyst
can predict neither the next bit in sequence nor the previous
one from the sole observation of a given sequence. In the
example of this figure it is carried out as follows:

Two large primes p and q are chosen which must be congruent
to 3 modulo 4. For the sake of simplicity, because
Mersenne primes (i.e., prime numbers of the form 2^k-1)
are all congruent to 3 modulo 4, the following prime
numbers are used in this example:

p = 2^61-1 = 2,305,843,009,213,693,951 and

q = 2^89-1 = 618,970,019,642,690,137,449,562,111

and n, the product of p by q (a Blum integer), is:

n = 1,427,247,692,705,959,880,439,315,947,500,961,989,719,490,561.

Then, the seed [510] obtained at the previous step is used to
compute the starting value X_0 [515] of the generator so
that X_0 = seed^2 modulo n, with seed = 3,543,759,276.
After which all successive internal values X_i [520] of the
generator are computed in the same way. That is,
X_i = X_(i-1)^2 modulo n. The i-th pseudo-random bit is the
LSB (Least Significant Bit) of X_i. Note that, because the
seed is a smaller number than either p or q, the requirement
of having to choose the seed relatively prime with n, for BBS
to work properly, is thus automatically fulfilled.
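
The generator of this example may be sketched in Python, whose
integers are of arbitrary size. The sketch assumes that the
first pseudo-random bit is taken from X_1 rather than from X_0;
the function name bbs_bits is hypothetical.

```python
from itertools import islice

P = 2**61 - 1  # both primes are congruent to 3 modulo 4
Q = 2**89 - 1
N = P * Q      # the Blum integer n of the example


def bbs_bits(seed: int, n: int = N):
    """Yield the least significant bit of each successive BBS state."""
    x = pow(seed, 2, n)       # X_0 = seed^2 modulo n
    while True:
        x = pow(x, 2, n)      # X_i = X_(i-1)^2 modulo n
        yield x & 1           # the i-th bit is the LSB of X_i


if __name__ == "__main__":
    first_bits = list(islice(bbs_bits(3543759276), 21))
    print("".join(str(bit) for bit in first_bits))
```

The same seed always reproduces the same bit stream, which is
what allows the recipient to rebuild the pattern of extra
blanks.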

Enough bits are generated so that enough valid numbers [525]
(i.e., 7 in the example of figure 5) can be randomly drawn
to insert extra blanks (the number of extra blanks to be
inserted is an input parameter that must be set for a
particular application). Although alternate methods are
possible, a simple way of achieving this is to index the
inter-word blanks of the canonical text, e.g., [530]. Depending
on what particular format of canonical text has been agreed on,
one may have chosen to exclude some inter-word blanks though.
As discussed in figure 2, the end-of-line blanks [531] could
have been excluded. Whichever format is adopted for the
canonical text, enough bits [535] must be generated from the
PRN generator to cover the whole range of blanks of the text
(or whatever range is chosen). In this example 7 bits
(2^7-1 = 127) are enough to cover the 71 word intervals of the
text. Therefore, when 7 bits have been generated a first number
is derived [540]. For each subsequent set of 7 bits a new
number is obtained. However, if a number is larger than 71
[550] or if a number repeats [545], it is skipped. In this
example the following sequence of valid numbers is generated
(skipped numbers omitted): 2 40 48 24 4 41 17. When a 7th valid
number is reached [525] (here with 17) the generator is stopped
since there are enough blanks to be inserted.
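
The drawing of positions from the bit stream may be sketched as
follows. The sketch assumes that each group of bits is read most
significant bit first, that blanks are indexed from 1 and that a
value of 0 is skipped like any out-of-range value; none of these
details is fixed by the description above, and the function name
draw_positions is hypothetical.

```python
def draw_positions(bits, total_blanks: int, wanted: int) -> list:
    """Group bits into fixed-width numbers and keep the valid ones."""
    width = total_blanks.bit_length()  # 7 bits for 71 blanks
    stream = iter(bits)
    chosen = []
    while len(chosen) < wanted:
        value = 0
        for _ in range(width):
            value = (value << 1) | next(stream)
        # Skip numbers outside the index range and numbers already drawn.
        if 1 <= value <= total_blanks and value not in chosen:
            chosen.append(value)
    return chosen
```

Any iterable of bits can be supplied, for instance the output of
a generator such as the one of figure 5.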



Claims:



What is claimed is:

1. A method of marking a text document [100] through the
insertion of inter-word blank characters, said method
comprising the steps of:

editing [110] the number of said inter-word blank
characters of said text document in order to conform to a
model thus, obtaining a canonical text document [120];

retaining, from said canonical text document, to further
conform to said model, a subset of positions [230] of said
inter-word blank characters, said subset of positions in
which insertion of blank characters is permitted;

computing, using said canonical text document [120] and a
secret-key as inputs [130], a unique combination of
positions among said subset of positions;

inserting into each position [151] of said unique
combination of positions at least one extra blank character
thus, obtaining a marked text document [150].

2. The method according to claim 1 wherein said text document
[100] is actually a said marked text document [150] to be
authenticated by a recipient sharing said secret-key [130],
said method further comprising the step of:

comparing [160] said text document [100] to said marked
text document [150];

if matching exactly [161]:

accepting said received text document as authentic;
if not [162]:

rejecting said received text document as fake.

3. The method according to claim 1 wherein said model calls 
for stripping all inter-word blank characters [110], in excess 
of one, off said text document, said model further retaining 
all said positions of said inter-word blank characters in said 
subset of positions. 

4. The method according to any one of the preceding claims 
wherein said model calls for the insertion, into a soft-copy 
text document, of three blank characters [240] at each 
end-of-line. 

5. The method according to any one of the preceding claims 
wherein said model calls for excluding end-of-line blank 
characters [240] from said subset of positions. 

6. The method according to any one of the previous claims 
wherein the number of inserted blanks to mark a said text 
document is set to reach a probability equal to or less than a 
predefined value of obtaining an identical said marked text 
document purely by chance. 

7. The method according to claim 1 wherein the step of computing 
a unique combination of positions further includes the 
steps of: 

calculating a digest [342] uniquely representing said 
secret-key [330] combined with said canonical text [320]; 

deriving from said digest a plurality of randomly distributed 
numbers [346] fitting in said subset of positions. 

8. The method according to claim 7 wherein the step of calculating 
a digest is replaced by the step of: 

applying a hashing function [420] over said secret-key 
[415] concatenated with said canonical text [410] thus, 
obtaining a fixed-size keyed digest [430]. 

9. The method according to claim 7 wherein the step of deriving 
a plurality of randomly distributed numbers further 
includes the steps of: 

indexing said subset of positions [530]; 

using said digest as a seed [510] of a PRN (pseudo-random-number) 
generator; 

operating said PRN generator; said step of operating said 
PRN generator further including the steps of: 

retaining those of said numbers that fit said indexing 
[540]; 

excluding duplicated said numbers [545]; 

keep operating said PRN generator till enough valid numbers 
are withdrawn [525] to match the number of blanks to be 
inserted. 

10. An authentication system, in particular a system for 
authenticating a text document, comprising means adapted for 
carrying out the method according to any one of the previous 
claims. 

11. A computer-like readable medium comprising instructions for 
carrying out the method according to any one of the claims 1 
to 9. 

METHOD AND SYSTEM OF MARKING A TEXT DOCUMENT 
WITH A PATTERN OF EXTRA BLANKS 
FOR AUTHENTICATION 



Abstract 

The invention discloses how a text document can be marked 
through the insertion of inter-word blank characters for the 
purpose of becoming authenticatable. First, the text to be marked 
is edited so as to obtain a canonical form of it conforming to 
a model. Then, from this canonical form of the text and a 
secret-key used as inputs, a unique combination of inter-word 
blank character positions is computed, in which extra blanks 
are inserted, thus obtaining a marked text document. Authentication 
of a received marked text document is performed by a 
recipient sharing the secret-key, who further compares the 
received text document to the marked text document so that, if 
they match exactly, the received text document is 
accepted as authentic, or rejected as fake if not. The invention 
allows the information necessary to authenticate 
a text document to be merged into the body of the document itself, which 
works as well on soft-copy and hard-copy text documents. 
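
The marking and verification flow summarized above may be illustrated with the following Python sketch. It is a simplified illustration under stated assumptions and not the implementation of the invention: the function names are hypothetical, the model is reduced to "one blank between words" without any end-of-line handling, SHA-1 over the key concatenated with the canonical text stands in for the keyed digest, and Python's built-in `random.Random` is swapped in for the PRN generator purely for brevity.

```python
import hashlib
import random
import re
from typing import List


def canonicalize(text: str) -> str:
    """Strip inter-word blanks in excess of one (a simplified 'model')."""
    return re.sub(r" {2,}", " ", text)


def choose_positions(canonical: str, key: bytes, n_blanks: int) -> List[int]:
    """Pick `n_blanks` distinct word intervals from the key and the text."""
    n_intervals = canonical.count(" ")
    if n_blanks > n_intervals:
        raise ValueError("not enough word intervals for the requested blanks")
    # Keyed digest: hash of the secret key concatenated with the text.
    digest = hashlib.sha1(key + canonical.encode("utf-8")).digest()
    # Assumption: random.Random replaces the PRN generator of the description.
    rng = random.Random(int.from_bytes(digest, "big"))
    return sorted(rng.sample(range(n_intervals), n_blanks))


def mark(text: str, key: bytes, n_blanks: int) -> str:
    """Return the marked text: canonical form plus one extra blank per position."""
    canonical = canonicalize(text)
    chosen = set(choose_positions(canonical, key, n_blanks))
    out = []
    interval = 0  # zero-based index of the current word interval
    for ch in canonical:
        out.append(ch)
        if ch == " ":
            if interval in chosen:
                out.append(" ")  # extra blank inserted at a chosen position
            interval += 1
    return "".join(out)


def is_authentic(received: str, key: bytes, n_blanks: int) -> bool:
    """Re-mark the received text and accept it only on an exact match."""
    return received == mark(received, key, n_blanks)
```

In this sketch the recipient never needs the original text: canonicalizing the received document removes the extra blanks, so re-marking it with the shared key reproduces the expected pattern, which is then compared character by character with what was received.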



Figure 1. 



A message authentication code (MAC), also known as a data 
authentication code (DAC), is a one-way hash function with the 
addition of a secret key. The hash value is a function of both 
the pre-image and the key. The theory is exactly the same as 
hash functions, except only someone with the key can verify the 
hash value. (Excerpt from 'Applied Cryptography' by Bruce 
Schneier. 2nd Ed., Wiley, 19... 

[410] [hexadecimal dump of the secret key followed by the canonical text] 

11010011001110010111100110101100 

32-bit Digest used as Seed 

(3,543,759,276) 

Figure 4 

X0 = seed mod n 

Pseudo-Random-Number Generator (Blum, Blum & Shub) 

[520] Xi = (Xi-1)^2 mod n 

ith pseudo-random bit is LSB of Xi 
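
The Blum, Blum & Shub recurrence shown in this figure can be sketched in Python as below. This is a minimal sketch under stated assumptions: the names `bbs_bits` and `take_bits` are hypothetical, and the toy primes used in the usage note are far too small for real use and are not taken from the document. The generator requires two primes p and q that are both congruent to 3 modulo 4, with n = p * q, and a seed that shares no factor with n.

```python
import itertools
import math
from typing import Iterator, List


def bbs_bits(seed: int, p: int, q: int) -> Iterator[int]:
    """Yield pseudo-random bits from a Blum, Blum & Shub generator.

    X0 = seed mod n, Xi = (Xi-1)^2 mod n, and the ith bit is the LSB of Xi.
    """
    if p % 4 != 3 or q % 4 != 3:
        raise ValueError("p and q must both be congruent to 3 modulo 4")
    n = p * q
    x = seed % n  # X0 = seed mod n
    if math.gcd(x, n) != 1:
        raise ValueError("seed must be coprime to n")
    while True:
        x = (x * x) % n  # Xi = (Xi-1)^2 mod n
        yield x & 1      # least significant bit of Xi


def take_bits(bits: Iterator[int], count: int) -> List[int]:
    """Collect the first `count` bits of a bit stream."""
    return list(itertools.islice(bits, count))
```

For instance, with the toy values p = 11, q = 23 and seed = 3 (illustrative only), `take_bits(bbs_bits(3, 11, 23), 8)` gives `[1, 1, 0, 0, 1, 0, 1, 0]`, the successive states being 9, 81, 236, 36, 31, 202, 71 and 234.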